Comparison of the C4.5 and a Naive Bayes Classifier for the Prediction of Lung Cancer Survivability

نویسندگان

  • George Dimitoglou
  • James A. Adams
  • Carol M. Jim
چکیده

Numerous data mining techniques have been developed to extract information and identify patterns and predict trends from large data sets. In this study, two classification techniques, the J48 implementation of the C4.5 algorithm and a Naive Bayes classifier are applied to predict lung cancer survivability from an extensive data set with fifteen years of patient records. The purpose of the project is to verify the predictive effectiveness of the two techniques on real, historical data. Besides the performance outcome that renders J48 marginally better than the Naive Bayes technique, there is a detailed description of the data and the required pre-processing activities. The performance results confirm expectations while some of the issues that appeared during experimentation, underscore the value of having domain-specific understanding to leverage any domain-specific characteristics inherent in the data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Predicting Breast Cancer Survivability Rates For data collected from Saudi Arabia Registries

The application of data mining and machine learning in directing clinical research into possible hidden knowledge is becoming greatly influencial in cancer research. This research presents a comparison of three data mining classification models: multi-layer perceptron neural networks, C4.5 decision trees and Naive Bayes. The classification models are built for breast cancer survivability predic...

متن کامل

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...

متن کامل

In silico prediction of anticancer peptides by TRAINER tool

Cancer is one of the causes of death in the world. Several treatment methods exist against cancer cells such as radiotherapy and chemotherapy. Since traditional methods have side effects on normal cells and are expensive, identification and developing a new method to cancer therapy is very important. Antimicrobial peptides, present in a wide variety of organisms, such as plants, amphibians and ...

متن کامل

A comparison of stacking with MDTs to bagging, boosting, and other stacking methods

In this paper, we present an integration of the algorithm MLC4.5 for learning meta decision trees (MDTs) into the Weka data mining suite. MDTs are a method for combining multiple classifiers. Instead of giving a prediction, MDT leaves specify which classifier should be used to obtain a prediction. The algorithm is based on the C4.5 algorithm for learning ordinary decision trees. An extensive pe...

متن کامل

Extracting Predictor Variables to Construct Breast Cancer Survivability Model with Class Imbalance Problem

Application of data mining methods as a decision support system has a great benefit to predict survival of new patients. It also has a great potential for health researchers to investigate the relationship between risk factors and cancer survival. But due to the imbalanced nature of datasets associated with breast cancer survival, the accuracy of survival prognosis models is a challenging issue...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1206.1121  شماره 

صفحات  -

تاریخ انتشار 2012